Two Notes on Notation 

by Donald E. Knuth 
Computer Science Department, Stanford University 

Mathematical notation evolves like all languages do. As new experiments are made, we some- 
times witness the survival of the fittest, sometimes the survival of the most familiar. A healthy 
conservatism keeps things from changing too rapidly; a healthy radicalism keeps things in tune with 
new theoretical emphases. Our mathematical language continues to improve, just as "the d-ism of 
Leibniz overtook the dotage of Newton" in past centuries [4, Chapter 4]. 

In 1970 I began teaching a class at Stanford University entitled Concrete Mathematics. The 
students and I studied how to manipulate formulas in continuous and discrete mathematics, and 
the problems we investigated were often inspired by new developments in computer science. As the 
years went by we began to see that a few changes in notational traditions would greatly facilitate 
our work. The notes from that class have recently been published in a book [15], and as I wrote 
the final drafts of that book I learned to my surprise that two of the notations we had been using 
were considerably more useful than I had previously realized. The ideas "clicked" so well, in fact, 
that I've decided to write this article, blatantly attempting to promote these notations among the 
mathematicians who have no use for [15]. I hope that within five years everybody will be able to 
use these notations in published papers without needing to explain what they mean. 

The notations I'm talking about are (1) Iverson's convention for characteristic functions; and 
(2) the "right" notation for Stirling numbers, at last. 

1. Iverson's convention. The first notational development I want to discuss was introduced by 
Kenneth E. Iverson in the early 60s, on page 11 of the pioneering book [21] that led to his well 
known APL. 

"If $\alpha$ and $\beta$ are arbitrary entities and $\mathcal{R}$ is any relation defined on them, the relational 
statement $(\alpha \mathcal{R} \beta)$ is a logical variable which is true (equal to 1) if and only if $\alpha$ stands in 
the relation $\mathcal{R}$ to $\beta$. For example, if $x$ is any real number, then the function 

$$(x > 0) - (x < 0)$$

(commonly called the sign function or sgn $x$) assumes the values 1, 0, or $-1$ according as 
$x$ is strictly positive, 0, or strictly negative." 

When I read that, long ago, I found it mildly interesting but not especially significant. I began 
using his convention informally but infrequently, in class discussions and in private notes. I allowed 
it to slip, undefined, into an obscure corner of one of my books (see page 117 of [16]). But when 
I prepared the final manuscript of [15], I began to notice that Iverson's idea led to substantial 
improvements in exposition and in technique. 

Before I can explain why the notation now works so well for me, I need to say a few words 
about the manipulation of sums and summands. I realized long ago that "boundary conditions" 
on indices of summation are often a handicap and a waste of time. Instead of writing 

$$(1+z)^n = \sum_{k=0}^{n} \binom{n}{k} z^k \tag{1.1}$$

it is much better to write 

$$(1+z)^n = \sum_{k} \binom{n}{k} z^k ; \tag{1.2}$$

the sum now extends over all integers k, but only finitely many terms are nonzero. The second 
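formula (1.2) is instantly converted to other forms: 

$$(1+z)^n = \sum_{k} \binom{n}{k-1} z^{k-1} = \sum_{k} \binom{n}{n-k} z^{n-k} = \sum_{k} \binom{n}{\lfloor n/2 \rfloor - k} z^{\lfloor n/2 \rfloor - k} ; \tag{1.3}$$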

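by contrast, we must work harder when dealing with (1.1), because we have to think about the 
limits: 

$$(1+z)^n = \sum_{k=1}^{n+1} \binom{n}{k-1} z^{k-1} = \sum_{k=0}^{n} \binom{n}{n-k} z^{n-k} = \sum_{k=-\lceil n/2 \rceil}^{\lfloor n/2 \rfloor} \binom{n}{\lfloor n/2 \rfloor - k} z^{\lfloor n/2 \rfloor - k} . \tag{1.4}$$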

Furthermore, (1.2) and (1.3) make sense also when n is not a positive integer. 

Even when limits are necessary, it is best to keep them as simple as possible. For example, it's 
almost always a mistake to write 

$$\sum_{k=2}^{n-1} k(k-1)(n-k) \quad\text{instead of}\quad \sum_{k=0}^{n} k(k-1)(n-k) ; \tag{1.5}$$

the additional zero terms are more helpful than harmful (and the former sum is problematical when 
n = 0, 1, or 2). 

Finally it dawned on me that Iverson's convention allows us to write any sum as an infinite 
sum without limits: If P(k) is any property of the integer k, we have 

$$\sum_{P(k)} f(k) = \sum_{k} f(k)\,[P(k)] . \tag{1.6}$$

For example, the sums in (1.5) become 

$$\sum_{k} k(k-1)(n-k)\,[0 \le k \le n] = \sum_{k} k(k-1)(n-k)\,[k \ge 0]\,[k \le n] . \tag{1.7}$$

(At the time I made this observation, I had forgotten that Iverson originally defined his convention 
only for single relational operators enclosed in parentheses; I began to put arbitrary logical 
statements in square brackets, and to assume that this would produce the value 0 or 1.) In this 
particular case nothing much has been gained when passing from (1.5) to (1.7), although we might 
be able to make use of identities like 



$$k\,[k \ge 0] = k\,[k \ge 1] . \tag{1.8}$$



But in general, the ability to manipulate "on the line" instead of "below the line" turns out to be 
a great advantage. 
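
As an illustration only (Python and the helper name `iverson` are assumptions introduced here, not part of the original notes), a bracket can be modelled as a function from a truth value to 0 or 1, and the limits of the sums in (1.5) can then be moved "on the line" as bracket factors. A program cannot range over all integers, so this sketch sums over a finite window that is assumed to contain every nonzero term.

```python
def iverson(statement: bool) -> int:
    """Iverson's convention: 1 if the statement is true, 0 otherwise."""
    return 1 if statement else 0


def sum_with_limits(n: int) -> int:
    # First form in (1.5): limits 2 <= k <= n-1 written around the sigma.
    return sum(k * (k - 1) * (n - k) for k in range(2, n))


def sum_with_brackets(n: int, window: int = 100) -> int:
    # Limit-free form: k ranges over a wide window (an assumption standing in
    # for "all integers"), and the brackets switch off the unwanted terms.
    return sum(
        k * (k - 1) * (n - k) * iverson(k >= 0) * iverson(k <= n)
        for k in range(-window, window + 1)
    )


# The two forms agree for every n inside the window, including n = 0, 1, 2.
agreement = all(sum_with_limits(n) == sum_with_brackets(n) for n in range(0, 50))
```

In Python the call `int(statement)` would serve equally well, because `True` and `False` already behave as 1 and 0 in arithmetic.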

For example, in my first book [25] I had found it necessary to include the rule 

$$\sum_{k \in A} f(k) + \sum_{k \in B} f(k) = \sum_{k \in A \cup B} f(k) + \sum_{k \in A \cap B} f(k) \tag{1.9}$$

as a separate axiom for $\sum$ manipulation. But this axiom is unnecessary in [15], because it can be 
derived easily from other basic laws: The left-hand side is 

$$\sum_{k \in A} f(k) + \sum_{k \in B} f(k) = \sum_{k} f(k)\,[k \in A] + \sum_{k} f(k)\,[k \in B] = \sum_{k} f(k)\,\bigl([k \in A] + [k \in B]\bigr) ,$$

and the right-hand side is the same, because we have 

$$[k \in A] + [k \in B] = [k \in A \cup B] + [k \in A \cap B] . \tag{1.10}$$
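
The pointwise identity (1.10) and the sum rule (1.9) that follows from it can be spot-checked numerically. The sketch below is a hedged illustration: the sets `A` and `B`, the summand `f`, and the finite `universe` are arbitrary choices made for the demonstration, and a check on sample data is of course not a proof.

```python
def iverson(statement: bool) -> int:
    return 1 if statement else 0


def identity_holds(A: set, B: set, universe) -> bool:
    # Pointwise check of [k in A] + [k in B] = [k in A u B] + [k in A n B].
    return all(
        iverson(k in A) + iverson(k in B)
        == iverson(k in A | B) + iverson(k in A & B)
        for k in universe
    )


def f(k: int) -> int:
    return k * k + 1          # arbitrary summand, chosen only for the demo


A = {1, 2, 3, 5, 8}
B = {2, 3, 4, 8, 9}
universe = range(-3, 15)      # finite stand-in for "all integers k"

lhs = sum(f(k) for k in A) + sum(f(k) for k in B)
rhs = sum(f(k) for k in A | B) + sum(f(k) for k in A & B)
bracket_form = sum(f(k) * (iverson(k in A) + iverson(k in B)) for k in universe)
```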

The interchange of summation order in multiple sums also comes out simpler now. I used to 
have trouble understanding and/or explaining why 

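$$\sum_{j=1}^{n} \sum_{k=1}^{j} f(j,k) = \sum_{k=1}^{n} \sum_{j=k}^{n} f(j,k) ; \tag{1.11}$$

but now it's easy for me to see that the left-hand sum is 

$$\sum_{j,k} f(j,k)\,[1 \le j \le n]\,[1 \le k \le j] = \sum_{j,k} f(j,k)\,[1 \le k \le j \le n] = \sum_{j,k} f(j,k)\,[1 \le k \le n]\,[k \le j \le n] ,$$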

and this is the right-hand sum. 
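
The interchange of summation order can be sketched in code as three ways of computing the same double sum. This is an illustrative check under stated assumptions: the summand `f` is arbitrary, and a small window of index values stands in for "all integers j, k".

```python
def iverson(statement: bool) -> int:
    return 1 if statement else 0


def f(j: int, k: int) -> int:
    return 10 * j + k         # arbitrary summand, chosen only for the demo


def rows_first(n: int) -> int:
    # Left side of (1.11): j = 1..n outside, k = 1..j inside.
    return sum(f(j, k) for j in range(1, n + 1) for k in range(1, j + 1))


def columns_first(n: int) -> int:
    # Right side of (1.11): k = 1..n outside, j = k..n inside.
    return sum(f(j, k) for k in range(1, n + 1) for j in range(k, n + 1))


def bracket_form(n: int) -> int:
    # One unrestricted double sum; the single bracket [1 <= k <= j <= n]
    # carries all the limits. The window -2..n+2 stands in for all integers.
    window = range(-2, n + 3)
    return sum(f(j, k) * iverson(1 <= k <= j <= n) for j in window for k in window)
```

Because the bracket form does not mention an order of summation at all, either nesting of the loops over `window` gives the same total.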

Here's another example: We have 

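$$[k \text{ even}] = \sum_{m} [k = 2m] \quad\text{and}\quad [k \text{ odd}] = \sum_{m} [k = 2m+1] ; \tag{1.12}$$

therefore 

$$\begin{aligned}
\sum_{k} f(k) &= \sum_{k} f(k)\,\bigl([k \text{ even}] + [k \text{ odd}]\bigr) \\
&= \sum_{k,m} f(k)\,[k = 2m] + \sum_{k,m} f(k)\,[k = 2m+1] \\
&= \sum_{m} f(2m) + \sum_{m} f(2m+1) .
\end{aligned} \tag{1.13}$$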

The result in (1.13) is hardly surprising; but I like to have mechanical operations like this available 
so that I can do manipulations reliably, without thinking. Then I'm less apt to make mistakes. 
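
Let lg stand for logarithms to base 2. Then we have 

$$\begin{aligned}
\sum_{k \ge 1} \binom{n}{\lfloor \lg k \rfloor} &= \sum_{k \ge 1} \sum_{m} \binom{n}{m}\,[m = \lfloor \lg k \rfloor] \\
&= \sum_{k,m} \binom{n}{m}\,[m \le \lg k < m+1]\,[k \ge 1] \\
&= \sum_{m,k} \binom{n}{m}\,[2^m \le k < 2^{m+1}]\,[k \ge 1] \\
&= \sum_{m} \binom{n}{m}\,(2^{m+1} - 2^m)\,[m \ge 0] = \sum_{m} \binom{n}{m}\,2^m = 3^n .
\end{aligned} \tag{1.14}$$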
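
The lg derivation above can be checked numerically for small cases. The sketch assumes that n is a nonnegative integer, so that only finitely many terms are nonzero; the function names are hypothetical. By the binomial theorem, the sum over m of C(n, m) 2^m equals (1 + 2)^n, which is the value both sums should reach.

```python
from math import comb


def floor_lg(k: int) -> int:
    # For a positive integer k, bit_length() - 1 equals floor(lg k).
    return k.bit_length() - 1


def sum_over_k(n: int) -> int:
    # Sum of C(n, floor(lg k)) over k >= 1. Terms vanish once floor(lg k) > n,
    # so k < 2**(n+1) covers every nonzero term.
    return sum(comb(n, floor_lg(k)) for k in range(1, 2 ** (n + 1)))


def sum_over_m(n: int) -> int:
    # After the interchange: each m >= 0 is hit by 2**(m+1) - 2**m values of k.
    return sum(comb(n, m) * (2 ** (m + 1) - 2 ** m) for m in range(0, n + 1))
```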



If we are doing infinite products we can use Iversonian brackets as exponents: 

$$\prod_{P(k)} f(k) = \prod_{k} f(k)^{[P(k)]} . \tag{1.15}$$

For example, the largest squarefree divisor of $n$ is 

$$\prod_{p} p^{[p \text{ prime}]\,[p \text{ divides } n]} .$$
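
A minimal sketch of brackets used as exponents, assuming n is a positive integer (the helper names are hypothetical): the product for the largest squarefree divisor runs over every p from 1 to n, and the exponent is 1 only when both bracketed statements hold. Factors with p greater than n would all have exponent 0, so the finite range loses nothing.

```python
from math import isqrt


def iverson(statement: bool) -> int:
    return 1 if statement else 0


def is_prime(p: int) -> bool:
    return p >= 2 and all(p % d != 0 for d in range(2, isqrt(p) + 1))


def largest_squarefree_divisor(n: int) -> int:
    # Product over p of p ** ([p prime] * [p divides n]).
    result = 1
    for p in range(1, n + 1):
        result *= p ** (iverson(is_prime(p)) * iverson(n % p == 0))
    return result
```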



Everybody is familiar with one special case of an Iverson-like convention, the "Kronecker delta" 
symbol 

$$\delta_{jk} = \begin{cases} 1, & j = k ; \\ 0, & j \ne k . \end{cases} \tag{1.16}$$

Leopold Kronecker introduced this notation in his work on bilinear forms [30, page 276] and in his 
lectures on determinants (see [31, page 316]); it soon became widespread. Many of his followers 
wrote $\delta_j^k$, which is a bit more ambiguous because it conflicts with ordinary exponentiation. I now 
prefer to write $[j = k]$ instead of $\delta_{jk}$, because Iverson's convention is much more general. Although 
'$[j = k]$' involves five written characters instead of the three in '$\delta_{jk}$', we lose nothing in common 
cases when '$[j = k+1]$' takes the place of '$\delta_{j(k+1)}$'. 

Another familiar example of a 0-1 function, this time from continuous mathematics, is Oliver 
Heaviside's unit step function [x > 0]. (See [44] and [37] for expositions of Heaviside's methods.) It 
is clear that Iverson's convention will be as useful with integration as it is with summation, perhaps 
even more so. I have not yet explored this in detail, because [15] deals mostly with sums. 

It's interesting to look back into the history of mathematics and see how there was a craving 
for such notations before they existed. For example, an Italian count named Guglielmo Libri 
published several papers in the 1830s concerning properties of the function $0^{0^x}$. He noted [32] 
that $0^x$ is either 0 (if $x > 0$) or 1 (if $x = 0$) or $\infty$ (if $x < 0$), hence 

$$0^{0^x} = [x > 0] . \tag{1.17}$$

But of course he didn't have Iverson's convention to work with; he was pleased to discover a way 
to denote the discontinuous function [x > 0] without leaving the realm of operations acceptable in 
his day. He believed that "la fonction $0^{0^x}$ est d'un grand usage dans l'analyse mathématique." 
And he noted in [33] that his formulas "ne renferment aucune notation nouvelle. . . . Les formules 
qu'on obtient de cette manière sont très simples, et rentrent dans l'algèbre ordinaire." 
Libri wrote, for example, 

$$(1 - 0^{0^{-x}})(1 - 0^{0^{x-a}})$$

for the function $[0 \le x \le a]$, and he gave the integral formula 

$$\frac{2}{\pi} \int_0^\infty \frac{dq \, \cos qx}{1 + q^2} = \frac{e^{x}}{0^{-x} + 1} + \frac{e^{-x}}{0^{x} + 1} .$$

(Of course, we would now write the value of that integral as $e^{-|x|}$, but a simple notation for 
absolute value wasn't introduced until many years later. I believe that the first appearance of '$|z|$' 
for absolute value in Crelle's journal — the journal containing Libri's papers [32] and [33] — occurred 
on page 227 of [56] in 1881. Karl Weierstrass was the inventor of this notation, which was applied 
at first only to complex numbers; Weierstrass seems to have published it first in 1876 [55].) 

Libri applied his 0° function to number theory by exhibiting a complicated way to describe 
the fact that x is a divisor of m. In essence, he gave the following recursive formulation: Let 
$P_0(x) = 1$ and for $k > 0$ let 

$$P_k(x) = -0^{0^{x-k}} P_0(x) - 0^{0^{x-k+1}} P_1(x) - \cdots - 0^{0^{x-1}} P_{k-1}(x) .$$

Then the quantity 

$$\frac{1 - m \cdot 0^{0^{x-m}} P_0(x) - (m-1) \cdot 0^{0^{x-m+1}} P_1(x) - \cdots - 2 \cdot 0^{0^{x-2}} P_{m-2}(x) - 0^{0^{x-1}} P_{m-1}(x)}{x}$$

turns out to equal 1 if $x$ divides $m$, otherwise it is 0. (One way to prove this, Iverson-wise, is 
to replace $0^{0^{x-k}}$ in Libri's formulas by $[x > k]$, and to show first by induction that 
$P_k(x) = [x \text{ divides } k] - [x \text{ divides } k-1]$ for all $k > 0$. Then if $a_k(x) = k\,[x > k]$, we have 

$$\begin{aligned}
\sum_{k=0}^{m-1} a_{m-k}(x) P_k(x) &= \sum_{k=0}^{m-1} a_{m-k}(x) \bigl([x \text{ divides } k] - [x \text{ divides } k-1]\bigr) \\
&= \sum_{k=0}^{m-1} [x \text{ divides } k] \bigl(a_{m-k}(x) - a_{m-k-1}(x)\bigr) .
\end{aligned}$$

If the positive integer $x$ is not a divisor of $m$, the terms of this new sum are zero except when 
$m - k = m \bmod x$, when we have $a_{m-k}(x) - a_{m-k-1}(x) = 1$. On the other hand if $x$ is a divisor of $m$, 
the only nonvanishing term occurs for $m - k = x$, when we have $a_{m-k}(x) - a_{m-k-1}(x) = -(x-1)$. 
Hence the sum is $1 - x\,[x \text{ divides } m]$. Libri obtained his complicated formula by a less direct method, 
applying Newton's identities to compute the sum of the $m$th powers of the roots of the equation 
$t^{x-1} + t^{x-2} + \cdots + 1 = 0$.) 

Evidently Libri's main purpose was to show that unlikely functions can be expressed in alge- 
braic terms, somewhat as we might wish to show that some complex functions can be computed 
by a Turing Machine. "Give me the function $0^{0^x}$, and I'll give you an expression for $[x \text{ divides } m]$." 
But our goal with Iverson's notation is, by contrast, to find a simple and natural way to express 
quantities that help us solve problems. If we need a function that is 1 if and only if x divides m, 
we can now write [x divides m] . 

Some of Libri's papers are still well remembered, but [32] and [33] are not. I found no mention 
of them in Science Citation Index, after searching through all years of that index available in our 
library (1955 to date). However, the paper [33] did produce several ripples in mathematical waters 
when it originally appeared, because it stirred up a controversy about whether $0^0$ is defined. Most 
mathematicians agreed that $0^0 = 1$, but Cauchy [5, page 70] had listed $0^0$ together with other 
expressions like $0/0$ and $\infty - \infty$ in a table of undefined forms. Libri's justification for the equation 
$0^0 = 1$ was far from convincing, and a commentator who signed his name simply "S" rose to 
the attack [45]. August Möbius [36] defended Libri, by presenting his former professor's reason 
for believing that $0^0 = 1$ (basically a proof that $\lim_{x \to 0+} x^x = 1$). Möbius also went further and 
presented a supposed proof that $\lim_{x \to 0+} f(x)^{g(x)} = 1$ whenever $\lim_{x \to 0+} f(x) = \lim_{x \to 0+} g(x) = 0$. 
Of course "S" then asked [3] whether Möbius knew about functions such as $f(x) = e^{-1/x}$ and 
$g(x) = x$. (And paper [36] was quietly omitted from the historical record when the collected works 
of Möbius were ultimately published.) The debate stopped there, apparently with the conclusion 
that $0^0$ should be undefined. 

But no, no, ten thousand times no! Anybody who wants the binomial theorem 

$$(x+y)^n = \sum_{k} \binom{n}{k} x^k y^{n-k} \tag{1.18}$$

to hold for at least one nonnegative integer $n$ must believe that $0^0 = 1$, for we can plug in $x = 0$ 
and $y = 1$ to get 1 on the left and $0^0$ on the right. 

The number of mappings from the empty set to the empty set is $0^0$. It has to be 1. 
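
Both arguments can be mirrored in a short sketch. It is an illustration under stated assumptions rather than anything from the original notes: Python's integer power happens to follow the convention that `0 ** 0` is 1, and mappings are counted by brute force as tuples of codomain elements, one per domain element. The function names are hypothetical.

```python
from itertools import product
from math import comb


def binomial_sum(x: int, y: int, n: int) -> int:
    # Right side of the binomial theorem for a nonnegative integer n.
    return sum(comb(n, k) * x ** k * y ** (n - k) for k in range(n + 1))


def count_mappings(domain: list, codomain: list) -> int:
    # A mapping assigns one codomain element to each domain element, so the
    # mappings correspond to the tuples of codomain ** len(domain).
    return sum(1 for _ in product(codomain, repeat=len(domain)))


# With x = 0 and y = 1 the only surviving term is C(n,0) * 0**0 * 1**n.
theorem_holds_at_zero = all(binomial_sum(0, 1, n) == (0 + 1) ** n for n in range(6))
```

The count for two empty sets is 1 because there is exactly one empty tuple; an empty codomain with a nonempty domain gives 0 mappings.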

On the other hand, Cauchy had good reason to consider $0^0$ as an undefined limiting form, in 
the sense that the limiting value of $f(x)^{g(x)}$ is not known a priori when $f(x)$ and $g(x)$ approach 0 
independently. In this much stronger sense, the value of $0^0$ is less defined than, say, the value of 
$0 + 0$. Both Cauchy and Libri were right, but Libri and his defenders did not understand why truth 
was on their side. 

Well, it's instructive to study mathematical history and to observe how tastes change as 
progress is made. But let's come closer to the present, to see how Iverson's convention might 
be useful nowadays. Today's mathematical literature is, in fact, filled with instances where analogs 
of Iversonian brackets are being used — but the concepts must be expressed in a roundabout way, 
because his convention is not yet established. Here are two examples that I happened to notice 
just before writing this paper: 

(1) Hardy and Wright, in the course of proving the Staudt-Clausen theorem about the denom- 
inators of Bernoulli numbers [20, § 7.9], consider the sum 

$$\sum_{(p-1) \text{ divides } k} \frac{1}{p}$$

where $p$ runs through primes. They define $\epsilon_k(p)$ to be 1 if $p - 1$ divides $k$, otherwise $\epsilon_k(p) = 0$; 
then the sum becomes 

$$\sum_{p} \frac{\epsilon_k(p)}{p} .$$

They proceed to show that $\sum_{m=1}^{p-1} m^k \equiv -\epsilon_k(p)$ (mod $p$) whenever $p$ is prime, and the theorem 
follows with a bit more manipulation. 

(2) Mark Kac, introducing the relation of ergodic theory to continued fractions [24, § 5.4], 
says: "Let now $P_0 \in \Omega$ and $g(P)$ the characteristic function of the measurable set $A$; i.e., 

$$g(P) = \begin{cases} 1, & P \in A \\ 0, & P \notin A . \end{cases}$$

It is now clear that $t(\tau, P_0, A)$ is given by the formula 

$$t(\tau, P_0, A) = \int_0^\tau g(T_t(P_0)) \, dt ,$$

and ... ". 

I hope it is now clear why my students and I would find it quite natural to say directly that 

$$t(\tau, P_0, A) = \int_0^\tau [T_t(P_0) \in A] \, dt .$$

Also, in the context of Hardy and Wright, we would evaluate $\bigl(\sum_{m=1}^{p-1} m^k\bigr) \bmod p$ and discover 
that it is $(p-1)\,[p-1 \text{ divides } k]$. 

If you are a typical hard-working, conscientious mathematician, interested in clear exposition 
and sound reasoning — and I like to include myself as a member of that set — then your experiences 
with Iverson's convention may well go through several stages, just as mine did. First, I learned about 
the idea, and it certainly seemed straightforward enough. Second, I decided to use it informally 
while solving problems. At this stage it seemed too easy to write just '$[k > 0]$'; my natural tendency 
was to write something like '$\delta(k > 0)$', giving an implicit bow to Kronecker, or '$\tau(k > 0)$' where 
$\tau$ stands for truth. Adriano Garsia, similarly, decided to write '$\chi(k > 0)$', knowing that $\chi$ often 
denotes a characteristic function; he has used $\chi$ notation effectively in dozens of papers, beginning 
with [10], and quite a few other mathematicians have begun to follow his lead. (Garsia was one of 
my professors in graduate school, and I recently showed him the first draft of this note. He replied, 
"My definition from the very start was 

$$\chi(\mathcal{A}) = \begin{cases} 1 & \text{if } \mathcal{A} \text{ is true} \\ 0 & \text{if } \mathcal{A} \text{ is false} \end{cases}$$

where $\mathcal{A}$ is any statement whatever. But just like you, I got it by generalizing from Iverson's APL. 
... I don't have to tell you the magic that the use of the $\chi$ notation can do.") 

If you go through the stages I did, however, you'll soon tire of writing $\delta$, $\tau$, or $\chi$, when you 
recognize that the notation is quite unambiguous without an additional symbol. Then you will 
have arrived at the philosophical position adopted by Iverson when he wrote [21]. And I had also 
reached that stage when I completed the first edition of [15]; I adopted Iverson's original suggestion 
to enclose logical statements in ordinary parentheses, not square brackets. 

Unfortunately, not all was well with that first edition. Students found cases where I had 
parenthesized a complicated logical statement for clarity, for example when I wrote something of 
the form '$\alpha$ and ($\beta$ or $\gamma$)'; they pointed out that the simple act of putting parentheses around 
'$\beta$ or $\gamma$' automatically caused it to be evaluated as either 0 or 1, according to a strict interpretation 
of Iverson's rule as I had extended it. 

Worse yet, as I began to read the first edition of [15] with fresh eyes, I found that the formulas 
involved too many parentheses. It was hard for me to perceive the structure of complex expressions 
that involved Iversonian statements; the statements had been clear to me when I wrote them down, 
but they looked confusing when I came back to them several months later. A computer could readily 
parse each expression, but good notation must be engineered for human beings. 

Therefore in the second and subsequent printings of [15], my co-authors and I now use square 
brackets instead of parentheses, whenever we wish to transform logical statements into the values 
0 or 1. This resolves both problems, and we now believe that the notation has proved itself well 
enough to be thrust upon the world. Square brackets are used also for other purposes, but not in 
a conflicting way, and not so often that the multiple uses become confusing. 

One small glitch remains: We want to be able to write things like 

$$\sum_{p} [p \text{ prime}]\,[p \le x] / p \tag{1.19}$$

to denote the sum of all reciprocals of primes $\le x$. But this summand unfortunately reduces to $0/0$ 
when p = 0. In general, when an Iverson-bracketed statement is false, we want it to evaluate into 
a "very strong 0," namely a zero so strong that it annihilates anything it is multiplied by — even if 
that other factor is undefined. 

Similarly, in formulas like (1.2) it is convenient to regard $\binom{n}{k}$ as strongly zero when $k$ is negative, 
so that, for example, $\binom{n}{-10} z^{-10} = 0$ when $z = 0$. 

The strong-zero convention is enough to handle 99% of the difficult situations, but we may also 
be using $1 - [P(k)]$ to stand for the quantity $[\text{not } P(k)]$; then we want $[P(k)]$ to give a "strong 1." 
And paradoxes can still arise, whenever irresistible forces meet immovable objects. (What happens 
if a strong zero appears in the denominator? And so on.) 
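
One way to picture a "strong 0" in a program is lazy evaluation: when the bracketed statement is false, the other factor is never computed, so an undefined value such as 1/0 cannot contaminate the result. The sketch below is only an analogy under that assumption; `strong_bracket` is a hypothetical helper, and exact fractions are used so that the sum of prime reciprocals can be checked without rounding.

```python
from fractions import Fraction
from math import isqrt


def is_prime(p: int) -> bool:
    return p >= 2 and all(p % d != 0 for d in range(2, isqrt(p) + 1))


def strong_bracket(statement: bool, factor):
    # "Strong zero": when the statement is false the result is 0 and the
    # other factor (a zero-argument callable) is never evaluated at all.
    return factor() if statement else 0


def sum_prime_reciprocals(x: int) -> Fraction:
    # Sum over p of [p prime][p <= x]/p; p = 0 is included on purpose.
    total = Fraction(0)
    for p in range(0, x + 1):
        total += strong_bracket(is_prime(p) and p <= x, lambda: Fraction(1, p))
    return total
```

The term for p = 0 contributes 0 without ever forming `Fraction(1, 0)`, which would otherwise raise `ZeroDivisionError`.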

In spite of these potential problems in extreme cases, Iverson's convention works beautifully in 
the vast majority of applications. It is, in fact, far less dangerous than most of the other notations 
of mathematics, whose dark corners we have learned to avoid long ago. The safe use of Iverson's 
simple and convenient idea is quite easy to learn. 

2. Stirling numbers. The second plea I wish to make for perspicuous notation concerns the 
famous coefficients introduced by James Stirling at the beginning of his Methodus Differentialis in 
1730 [52]. The lack of a widely accepted way to refer to these numbers has become almost scandalous. 
For example, Goldberg, Newman, and Haynsworth begin their chapter on Combinatorial 
Analysis in the NBS Handbook [1] by remarking that notations for Stirling numbers "have never 
been standardized . . . We feel that a capital S is natural for Stirling numbers of the first kind; it is 
infrequently used for other notation in this context. But once it is used we have difficulty finding 
a suitable symbol for Stirling numbers of the second kind. The numbers are sufficiently important 
to warrant a special and easily recognizable symbol, and yet that symbol must be easy to write. 
We have settled on a script capital S without any certainty that we have settled this question 
permanently." 

The present predicament came about because Stirling numbers are indeed important enough 
to have arisen in a wide variety of applications, yet they are not quite important enough to have 
deserved a prominent place in the most influential textbooks of mathematics. Therefore they have 
been rediscovered many times, and each author has chosen a notation that was optimized for one 
particular application. 

The great utility of Stirling numbers has become clearer and clearer with time, and mathe- 
maticians have now reached a stage where we can intelligently choose a notation that will serve us 
well in the whole range of applications. 

I came into the picture rather late, having never heard of Stirling numbers until after re- 
ceiving my Ph.D. in mathematics. But I soon encountered them as I was beginning to analyze 
the performance of algorithms and to write the manuscript for my books on The Art of Computer 
Programming. I quickly realized the truth of Imanuel Marx's comment that "these numbers 
have similarities with the binomial coefficients $\binom{n}{k}$; indeed, formulas similar to those known for 
the binomial coefficients are easily established" [35]. In order to emphasize those similarities and 
to facilitate pattern recognition when manipulating formulas, Marx recommended using bracket 
symbols ${n \brack k}$ for Stirling numbers of the first kind and brace symbols ${n \brace k}$ for Stirling numbers of 
the second kind. A similar proposal was being made at about the same time in Italy by Antonio 
Salmeri [46]. 

I was strongly motivated by Charles Jordan's book, Calculus of Finite Differences [23], which 
introduced me to the important analogies between sums of factorial powers and integrals of ordinary 
powers. But I kept getting mixed up when I tried to use Stirling numbers as he defined them, 
because half of his "first kind" numbers were negative and the other half were positive. I had 
similar problems with Marx's suggestions in [35]; he made all Stirling numbers of the first kind 
positive, but then he attached a minus sign to half the numbers of the second kind. I decided that 
I'd never be able to keep my head above water unless I worked with Stirling numbers that were 
entirely signless. 

And I soon learned that the signless Stirling numbers have important combinatorial signif- 
icance. So I decided to try a definition that combined the best qualities of the other notations 
I'd seen; I defined the quantities ${n \brack k}$ and ${n \brace k}$ as follows: 

${n \brack k}$ = the number of permutations of $n$ objects having $k$ cycles; 
${n \brace k}$ = the number of partitions of $n$ objects into $k$ nonempty subsets. 

For example, ${4 \brack 2} = 11$, because there are eleven different ways to arrange four elements into two 
cycles: 

[1,2,3] [4]    [1,2,4] [3]    [1,3,4] [2]    [2,3,4] [1] 
[1,3,2] [4]    [1,4,2] [3]    [1,4,3] [2]    [2,4,3] [1] 
[1,2] [3,4]    [1,3] [2,4]    [1,4] [2,3]. 

And ${4 \brace 2} = 7$, because the partitions of {1,2,3,4} into two subsets are 

{1,2,3}{4}    {1,2,4}{3}    {1,3,4}{2}    {2,3,4}{1} 
{1,2}{3,4}    {1,3}{2,4}    {1,4}{2,3}. 

Notice that this notation is mnemonic: The meaning of ${n \brace k}$ is easily remembered, because braces 
{ } are commonly used to denote sets and subsets. We could also adopt the convention of writing 
cycles in brackets, as in my examples above, where [1,2,3] = [2,3,1] = [3,1,2] is a typical three- 
cycle; that would make the notation equally mnemonic. But I don't insist on this. 
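
The two combinatorial definitions can be checked by brute-force enumeration for small cases. The following is a sketch, not an efficient method: it walks through all n! permutations and all set partitions, and the function names are hypothetical.

```python
from itertools import permutations


def cycle_count(perm: tuple) -> int:
    """Number of cycles of a permutation given as a tuple mapping i -> perm[i]."""
    seen = set()
    cycles = 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:      # walk around the cycle containing start
                seen.add(j)
                j = perm[j]
    return cycles


def stirling_cycle(n: int, k: int) -> int:
    # Permutations of n objects having exactly k cycles.
    return sum(1 for p in permutations(range(n)) if cycle_count(p) == k)


def set_partitions(items: list):
    """Yield every partition of the list `items` into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition


def stirling_subset(n: int, k: int) -> int:
    # Partitions of n objects into exactly k nonempty subsets.
    return sum(1 for p in set_partitions(list(range(1, n + 1))) if len(p) == k)
```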

I have never decided how to pronounce '${n \brack k}$' and '${n \brace k}$' when I'm reading formulas aloud in 
class. Many people have begun to verbalize '$\binom{n}{k}$' as "n choose k"; hence I've been saying "n cycle k" 
for ${n \brack k}$ and "n subset k" for ${n \brace k}$. But I have also caught myself calling them "n bracket k" and 
"n brace k." 

One of the advantages of these notational conventions is that binomial coefficients and Stirling 
numbers can be defined by very simple recurrence relations having a nice pattern: 



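$$\binom{n+1}{k} = \binom{n}{k} + \binom{n}{k-1} , \tag{2.1}$$

$${n+1 \brack k} = n {n \brack k} + {n \brack k-1} , \tag{2.2}$$

$${n+1 \brace k} = k {n \brace k} + {n \brace k-1} . \tag{2.3}$$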



Moreover — and this is extremely important — these identities hold for all integers n and k, whether 
positive, negative, or zero. Therefore we can apply them in the midst of any formula (for example, 
to "absorb" an $n$ or a $k$ that appears in the context $n {n \brack k}$ or $k {n \brace k}$) without worrying about 
exceptional circumstances of any kind. 

I introduced these notations in the first edition of my first book [25], and by now my students 
and I have accumulated some 25 years of experience with them; the conventions have served us well. 
However, such brackets and braces have still not become widely enough adopted that they could be 
considered "standard." For example, Stanley's magnificent book on Enumerative Combinatorics 
[51] uses $c(n,k)$ for ${n \brack k}$ and $S(n,k)$ for ${n \brace k}$. His notation conveys combinatorial significance, but 
it fails to suggest the analogies to binomial coefficients that prove helpful in manipulations. Such 
analogies were evidently not important enough in his mind to warrant an extravagant two-line 
notation, although he does use $\left(\!\binom{n}{k}\!\right)$ to denote $\binom{n+k-1}{k} = (-1)^k \binom{-n}{k}$, the number of combinations 
with repetitions permitted. (In a sense, Stanley's $\left(\!\binom{n}{k}\!\right)$ is a signless version of the numbers $\binom{-n}{k}$.) 

When I wrote Concrete Mathematics in 1988, I explored Stirling numbers more carefully than 
I had ever done before, and I learned two things that really clinch the argument for ${n \brack k}$ and ${n \brace k}$ as 
the best possible Stirling number notations. Ron Graham sent me a preview copy of a memorandum 
by B. F. Logan [34], which presented a number of interesting connections between Stirling numbers 
and other mathematical quantities. One of the first things that caught my attention was Logan's 
Table 1, a two-dimensional array that contained the numbers ${n \brack k}$ and ${n \brace k}$ simultaneously, implying 
that there really is only one "kind" of Stirling number. Indeed, when I translated Logan's results 
into my own favorite notation, I was astonished to find that his arrangement of numbers was 
equivalent to a beautiful and easily remembered law of duality, 

$${n \brace k} = {-k \brack -n} . \tag{2.4}$$



Once I had this clue, it was easy to check that the recurrence relations (2.2) and (2.3) are equivalent 
to each other. And the boundary conditions 

$${0 \brack k} = {0 \brace k} = [k = 0] \quad\text{and}\quad {n \brack 0} = {n \brace 0} = [n = 0] \tag{2.5}$$

yield unique solutions to (2.2) and (2.3) for all integers k and n, when we run the recurrences 
forward and backward; the "negative" region for Stirling numbers of one kind turns out to contain 
precisely the numbers of the other kind. For example, the following subset of Logan's table gives 
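the values of ${n \brack k}$ when $|n|$ and $|k|$ are at most 4: 

  n   k=-4  k=-3  k=-2  k=-1   k=0   k=1   k=2   k=3   k=4
 -4     1     0     0     0     0     0     0     0     0
 -3     6     1     0     0     0     0     0     0     0
 -2     7     3     1     0     0     0     0     0     0
 -1     1     1     1     1     0     0     0     0     0
  0     0     0     0     0     1     0     0     0     0
  1     0     0     0     0     0     1     0     0     0
  2     0     0     0     0     0     1     1     0     0
  3     0     0     0     0     0     2     3     1     0
  4     0     0     0     0     0     6    11     6     1

The reflection of this matrix about a 45° diagonal gives the values of ${n \brace k} = {-k \brack -n}$. 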
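
Running the recurrences forward and backward from the boundary values can be sketched as two memoized functions. This is an illustrative reading of (2.2), (2.3) and (2.5), not code from the original notes: the function names are hypothetical, and each branch simply solves the recurrence for whichever term moves n or k toward 0. In the region where n is negative and k positive the recurrence has to be divided by n (or k); the sketch asserts that this division is exact.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def cycle(n: int, k: int) -> int:
    """[n k] for all integers, from recurrence (2.2) and boundary values (2.5)."""
    if n == 0:
        return int(k == 0)
    if k == 0:
        return 0
    if n > 0:
        # (2.2) read forward: [n k] = (n-1)[n-1 k] + [n-1 k-1].
        return (n - 1) * cycle(n - 1, k) + cycle(n - 1, k - 1)
    if k < 0:
        # (2.2) solved for its last term: [n k] = [n+1 k+1] - n[n k+1].
        return cycle(n + 1, k + 1) - n * cycle(n, k + 1)
    # n < 0 < k: (2.2) solved for its middle term; the division is exact.
    numerator = cycle(n + 1, k) - cycle(n, k - 1)
    assert numerator % n == 0
    return numerator // n


@lru_cache(maxsize=None)
def subset(n: int, k: int) -> int:
    """{n k} for all integers, from recurrence (2.3) and boundary values (2.5)."""
    if n == 0:
        return int(k == 0)
    if k == 0:
        return 0
    if n > 0:
        # (2.3) read forward: {n k} = k{n-1 k} + {n-1 k-1}.
        return k * subset(n - 1, k) + subset(n - 1, k - 1)
    if k < 0:
        # (2.3) solved for its last term: {n k} = {n+1 k+1} - (k+1){n k+1}.
        return subset(n + 1, k + 1) - (k + 1) * subset(n, k + 1)
    # n < 0 < k: (2.3) solved for its middle term; the division is exact.
    numerator = subset(n + 1, k) - subset(n, k - 1)
    assert numerator % k == 0
    return numerator // k


# Duality law (2.4), checked on a square of small indices.
duality_holds = all(
    subset(n, k) == cycle(-k, -n) for n in range(-6, 7) for k in range(-6, 7)
)
```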

Naturally I wondered how I could have been working with Stirling numbers for so many years 
without having been aware of such a basic fact. Surely it must have been known before? After 
several hours of searching in the library, I learned that identity (2.4) had indeed been known, 
but largely forgotten by succeeding generations of mathematicians, primarily because previous 
notations for Stirling numbers made it impossible to state the identity in such a memorable form. 
These investigations also turned up several things about the history of Stirling numbers that I had 
not previously realized. 

During the nineteenth century, Stirling's connection with these numbers had been almost 
entirely forgotten. The numbers themselves were studied, in the role of "sums of products of 
combinations of the numbers $\{1, 2, \ldots, n\}$ taken $k$ at a time." Let $C_k(n)$ and $\Gamma_k(n)$ denote those 
sums, when the combinations are respectively without or with repetitions; thus, for example, 

$$C_3(4) = 1 \cdot 2 \cdot 3 + 1 \cdot 2 \cdot 4 + 1 \cdot 3 \cdot 4 + 2 \cdot 3 \cdot 4 = 50 ;$$

$$\Gamma_3(3) = 1 \cdot 1 \cdot 1 + 1 \cdot 1 \cdot 2 + 1 \cdot 1 \cdot 3 + 1 \cdot 2 \cdot 2 + 1 \cdot 2 \cdot 3 + 1 \cdot 3 \cdot 3 + 2 \cdot 2 \cdot 2 + 2 \cdot 2 \cdot 3 + 2 \cdot 3 \cdot 3 + 3 \cdot 3 \cdot 3 = 90 .$$



It turns out that 

$$C_k(n) = {n+1 \brack n+1-k} \quad\text{and}\quad \Gamma_k(n) = {n+k \brace n} . \tag{2.6}$$
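
The two sums of products can be computed directly from their definitions and compared with Stirling numbers obtained from the recurrences (2.2) and (2.3). The sketch below is a numerical spot check under the assumption that n and k are small nonnegative integers; the names `C`, `Gamma`, `cycle` and `subset` are hypothetical helpers. It compares C_k(n) with the cycle number whose upper index is n+1 and lower index is n+1-k, and Gamma_k(n) with the subset number whose upper index is n+k and lower index is n.

```python
from functools import lru_cache
from itertools import combinations, combinations_with_replacement
from math import prod


def C(k: int, n: int) -> int:
    # Sum of products of k distinct numbers chosen from {1, ..., n}.
    return sum(prod(c) for c in combinations(range(1, n + 1), k))


def Gamma(k: int, n: int) -> int:
    # Same, but repetitions are permitted.
    return sum(prod(c) for c in combinations_with_replacement(range(1, n + 1), k))


@lru_cache(maxsize=None)
def cycle(n: int, k: int) -> int:
    # Stirling cycle numbers for n, k >= 0 via recurrence (2.2).
    if n == 0 or k == 0:
        return int(n == k)
    return (n - 1) * cycle(n - 1, k) + cycle(n - 1, k - 1)


@lru_cache(maxsize=None)
def subset(n: int, k: int) -> int:
    # Stirling subset numbers for n, k >= 0 via recurrence (2.3).
    if n == 0 or k == 0:
        return int(n == k)
    return k * subset(n - 1, k) + subset(n - 1, k - 1)


first_identity = all(
    C(k, n) == cycle(n + 1, n + 1 - k) for n in range(7) for k in range(n + 1)
)
second_identity = all(
    Gamma(k, n) == subset(n + k, n) for n in range(6) for k in range(6)
)
```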



Christian Kramp [28] proved near the end of the eighteenth century that 

$$C_k(n) = \sum \binom{n+1}{k+l} \frac{(k+l)!}{j_1!\,2^{j_1}\, j_2!\,3^{j_2}\, j_3!\,4^{j_3} \cdots} , \tag{2.7}$$

$$\Gamma_k(n) = \sum \binom{n+k}{k+l} \frac{(k+l)!}{j_1!\,2!^{j_1}\, j_2!\,3!^{j_2}\, j_3!\,4!^{j_3} \cdots} , \tag{2.8}$$

where the sums are over all sequences of nonnegative integers $\langle j_1, j_2, j_3, \ldots \rangle$ such that we have 
$j_1 + 2j_2 + 3j_3 + \cdots = k$ (i.e., over all partitions of $k$), and where $l = j_1 + j_2 + j_3 + \cdots$. For example, 

$$C_2(n) = \binom{n+1}{4} 3 + \binom{n+1}{3} 2 ; \qquad \Gamma_2(n) = \binom{n+2}{4} 3 + \binom{n+2}{3} 1 .$$

Notice that $C_k(n)$ and $\Gamma_k(n)$ are polynomials in $n$, of degree $2k$. The duality law (2.4) and the 
notational transformations of (2.6) are equivalent to the amazing polynomial identity 

$$C_k(n-1) = \Gamma_k(-n) ; \tag{2.9}$$

but hardly anybody was aware of this surprising fact, otherwise we would almost certainly find it 
mentioned explicitly in the comprehensive surveys compiled in the 1890s [19, 38]. 

On the other hand, a rereading of Stirling's original treatment [52] makes it clear that Stirling 
himself would not have found the duality law (2.4) at all surprising. From the very beginning, he 
thought of the numbers as two triangles hooked together in tandem. Indeed, his entire motivation 
for studying them was the general identity 

$$z^n = \sum_{k} {n \brace k} z^{\underline{k}} , \tag{2.10}$$

which expresses ordinary powers in terms of falling factorial powers. When $n$ is positive, the nonzero 
terms in this sum occur for positive values of $k \le n$; but when $n$ is negative, the nonzero terms 
occur for negative $k \le n$. Stirling presented his tables by displaying ${n \brace k}$ with $k$ as the row index 
and ${n \brack k}$ with $k$ as the column index; thus, he visualized a tandem arrangement exactly as in the 
matrix of numbers above, with each column containing a sequence of coefficients for (2.10). 

I need to digress a bit about factorial powers. If n is a positive integer and z is a complex 
number, I like to write 

$$z^{\underline{n}} = z(z-1) \ldots (z-n+1) , \tag{2.11}$$

which I call "z to the n falling," and 

$$z^{\overline{n}} = z(z+1) \ldots (z+n-1) , \tag{2.12}$$

which is "z to the n rising." More generally, if $\alpha$ is any complex number, factorial powers are 
defined by 

$$z^{\underline{\alpha}} = z!/(z-\alpha)! \quad\text{and}\quad z^{\overline{\alpha}} = \Gamma(z+\alpha)/\Gamma(z) , \tag{2.13}$$

unless these formulas reduce to $\infty/\infty$ (when limiting values are used). My use of underlined and 
overlined exponents is still controversial, but I cannot resist mentioning a curious fact: Many people 
(e.g., specialists in hypergeometric series) have become accustomed to the notation $(z)_n$ for rising 
factorial powers, while many other people (e.g., statisticians) use the same notation for falling 
powers. The curious fact is that this notation is called "Pochhammer's symbol," but Pochhammer 
himself [43] used $(z)_n$ to stand for the binomial coefficient $\binom{z}{n}$. I prefer the underline/overline 
notation because it is unambiguous and mnemonic, especially when I'm doing work that involves 
factorial powers of both kinds. (Moreover, I know that $z^{\underline{n}}$ and $z^{\overline{n}}$ are easy to typeset, using macros 
available in the file gkpmac.tex in the standard UNIX distribution of TeX.) 

In the special case n = 3, Stirling's formula (2.10) gives

$$z^3 = {3 \brace 3} z^{\underline{3}} + {3 \brace 2} z^{\underline{2}} + {3 \brace 1} z^{\underline{1}} = z(z-1)(z-2) + 3z(z-1) + z.$$
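The n = 3 case can be checked mechanically. The following is a minimal sketch, assuming the standard recurrence for Stirling subset numbers (presumably the one cited elsewhere in this note as (2.3)) and restricted to nonnegative integers n; the function names `subset`, `falling`, and `power_from_falling` are illustrative choices, not notation from the paper.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def subset(n, k):
    """Stirling subset number {n brace k} for integers n, k >= 0."""
    if n == 0 or k == 0:
        return int(n == k)
    # Standard recurrence: {n,k} = k*{n-1,k} + {n-1,k-1}.
    return k * subset(n - 1, k) + subset(n - 1, k - 1)

def falling(z, n):
    """z to the n falling, z(z-1)...(z-n+1), for an integer n >= 0."""
    result = 1
    for j in range(n):
        result *= z - j
    return result

def power_from_falling(z, n):
    """Right-hand side of (2.10), summed over 0 <= k <= n."""
    return sum(subset(n, k) * falling(z, k) for k in range(n + 1))

print([subset(3, k) for k in (3, 2, 1)])    # [1, 3, 1]
print(power_from_falling(5, 3), 5 ** 3)     # 125 125
```

The coefficients 1, 3, 1 are those of the displayed expansion of z^3.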

And in the special case n = -1, it reduces to the infinite sum

$$\frac{1}{z} = \sum_k {-1 \brace k}\, z^{\underline{k}} = \frac{0!}{z+1} + \frac{1!}{(z+1)(z+2)} + \frac{2!}{(z+1)(z+2)(z+3)} + \cdots, \qquad (2.14)$$

because

$${-1 \brace -n} = {n \brack 1} = (n-1)!\,[n>0]. \qquad (2.15)$$
(Stirling did not discuss convergence; he was, after all, writing in 1730. We have the partial sum

$$\frac{1}{z} = \sum_{k=1}^{n} \frac{(k-1)!}{(z+1)\ldots(z+k)} + \frac{n!}{z(z+1)\ldots(z+n)};$$

this is a special case of the general identity

$$\frac{1}{z} = \sum_{k=1}^{n} \frac{z_1 \ldots z_{k-1}}{(z+z_1)\ldots(z+z_k)} + \frac{z_1 \ldots z_n}{z(z+z_1)\ldots(z+z_n)} \qquad (2.16)$$

discovered by François Nicole [39] a few years before Stirling's treatise appeared. Therefore the
infinite series (2.14) converges if and only if Re(z) > 0. By induction on n, the same condition is
necessary and sufficient for (2.10) when n is any negative integer. See [41, § 30] for further discussion
of (2.10).)
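The partial-sum identity for 1/z can be verified in exact rational arithmetic. This is a small sketch rather than anything from the paper; `partial_sum` and `remainder` are hypothetical names for the two pieces of the identity, and z is restricted to rational values so that Python's `Fraction` keeps everything exact.

```python
from fractions import Fraction
from math import factorial

def partial_sum(z, n):
    """sum_{k=1}^{n} (k-1)! / ((z+1)...(z+k)): the first n terms of (2.14)."""
    total, denom = Fraction(0), Fraction(1)
    for k in range(1, n + 1):
        denom *= z + k
        total += Fraction(factorial(k - 1)) / denom
    return total

def remainder(z, n):
    """The leftover term n! / (z(z+1)...(z+n))."""
    denom = Fraction(z)
    for j in range(1, n + 1):
        denom *= z + j
    return Fraction(factorial(n)) / denom

for z in (Fraction(3, 2), Fraction(-1, 2)):
    for n in (1, 5, 50):
        # The identity is exact for every n, whether or not the series converges.
        assert partial_sum(z, n) + remainder(z, n) == 1 / z
        print(z, n, float(abs(remainder(z, n))))
```

For z = 3/2 the printed remainder shrinks as n grows, while for z = -1/2 it grows, in line with the stated condition Re(z) > 0.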

We noted above that the numbers ${n \brack k}$ can be regarded as sums of products of combinations.
The first identity in (2.6) is equivalent to the formula

$$z^{\overline{n}} = \sum_k {n \brack k}\, z^k \qquad (2.17)$$

when n is a nonnegative integer, if we expand the product $z^{\overline{n}}$ and sum the coefficients of each power
of z. Similarly, we have

$$z^{\underline{n}} = \sum_k {n \brack k}\, (-1)^{n-k} z^k. \qquad (2.18)$$

These equations are valid also when n is a negative integer; in that case both infinite series converge
for $|z| > |n|$. Notice that (2.10) and (2.18) tell us how to convert back and forth between ordinary
powers and factorial powers.
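The back-and-forth conversion can be illustrated with coefficient lists. The sketch below assumes nonnegative integer n and the standard recurrences for both kinds of Stirling numbers; `to_falling`, `to_ordinary`, and `round_trip` are illustrative names introduced here.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def cycle(n, k):
    """Stirling cycle number [n brack k] for integers n, k >= 0."""
    if n == 0 or k == 0:
        return int(n == k)
    return (n - 1) * cycle(n - 1, k) + cycle(n - 1, k - 1)

@lru_cache(maxsize=None)
def subset(n, k):
    """Stirling subset number {n brace k} for integers n, k >= 0."""
    if n == 0 or k == 0:
        return int(n == k)
    return k * subset(n - 1, k) + subset(n - 1, k - 1)

def to_falling(n):
    """c[k] such that z^n = sum_k c[k] * (z falling k), as in (2.10)."""
    return [subset(n, k) for k in range(n + 1)]

def to_ordinary(n):
    """c[k] such that (z falling n) = sum_k c[k] * z^k, as in (2.18)."""
    return [cycle(n, k) * (-1) ** (n - k) for k in range(n + 1)]

def round_trip(n):
    """Expand z^n into falling powers, then expand each falling power back
    into ordinary powers; the result is the coefficient list of z^n."""
    coeffs = [0] * (n + 1)
    for k, c in enumerate(to_falling(n)):
        for j, d in enumerate(to_ordinary(k)):
            coeffs[j] += c * d
    return coeffs

print(to_ordinary(3))   # [0, 2, -3, 1], i.e. z(z-1)(z-2) = 2z - 3z^2 + z^3
print(round_trip(4))    # [0, 0, 0, 0, 1]
```

The round trip returns the single monomial z^n because the two coefficient triangles are inverse to each other.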

Let's turn now to the nineteenth century. Kramp [29] decided to explore a slightly generalized 
type of factorial power, for which he used the notations 



$$a^{n|r} = a(a+r)\ldots(a+(n-1)r) \qquad (2.19)$$
$$a^{-n|r} = 1/\bigl((a-r)(a-2r)\ldots(a-nr)\bigr) \qquad (2.20)$$

when n is a positive integer. Then he considered the expansion

$$a^{n|r} = a^n + n\,\mathfrak{k}\,1.\,a^{n-1}r + n\,\mathfrak{k}\,2.\,a^{n-2}r^2 + \cdots, \qquad (2.21)$$

where the coefficients $n\,\mathfrak{k}\,m$ are independent of a and r [29, §§539-540]; thus, $n\,\mathfrak{k}\,m$ was his notation
for ${n \brack n-m}$. He obtained [29, § 557] a series of formulas equivalent to



$$m {n \brack n-m} = \sum_{k=0}^{m-1} \binom{n-k}{m+1-k} {n \brack n-k} \qquad (2.22)$$

thereby giving a new proof that ${n \brack n-m}$ is a polynomial in n of degree 2m. This proof, independent
of his earlier formulas (2.7) and (2.8), works for both positive and negative values of n.
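Kramp's recurrence (2.22) is easy to test for small positive n. The following sketch assumes the standard recurrence for Stirling cycle numbers; `kramp_rhs` is a hypothetical helper name for the right-hand side.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def cycle(n, k):
    """Stirling cycle number [n brack k] for integers n, k >= 0."""
    if n == 0 or k == 0:
        return int(n == k)
    return (n - 1) * cycle(n - 1, k) + cycle(n - 1, k - 1)

def kramp_rhs(n, m):
    """Right-hand side of (2.22): sum over 0 <= k < m."""
    return sum(comb(n - k, m + 1 - k) * cycle(n, n - k) for k in range(m))

for n in range(1, 10):
    for m in range(1, n + 1):
        assert m * cycle(n, n - m) == kramp_rhs(n, m)

print(2 * cycle(4, 2), kramp_rhs(4, 2))   # 22 22
```

Because each term on the right involves only coefficients with smaller k, the formula determines the numbers for m = 1, 2, 3, ... in turn.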

Kramp implicitly understood the duality principle (2.4), in the sense that he regarded the
coefficients ${n \brack k}$ and ${n \brace k}$ as the positive and negative portions of a doubly infinite array of numbers.
In fact, he assumed that equation (2.21) would hold for arbitrary real values of n. He differentiated
$a^{x|r}$ with respect to x and gave formal derivations of several interesting series. However, his
expansion (2.21) is equivalent to

$$z^{\overline{n}} = \sum_k {n \brack n-k}\, z^{n-k} \qquad (2.23)$$

(a slight variation of (2.17)), and this series is not always convergent for noninteger n. We can
show, for example, that

$$\left| {1/2 \brack 1/2-k} \right| > k!/7^k \quad\text{for infinitely many } k; \qquad (2.24)$$

hence (2.23) diverges for all z when n = 1/2. Kramp lived before the days when convergence of
infinite series was understood. (See [29, § 574], where he says that the divergent series $\sum_{k>0} B_k D^k/k$
is "très convergente pour peu que y soit une petite fraction"!)

Several other nineteenth-century authors developed the theory of factorial powers, notably
Andreas von Ettingshausen [6], Ludwig Schläfli [41, 48], and Oskar Schlömilch [49], who used the
respective notations

$$\overset{n}{F}_m, \quad \overset{n}{A}_m, \quad\text{and}\quad \overset{n}{C}_m$$

for the coefficients ${n \brack n-m}$. All of these authors considered both positive and negative integers n.
Thus, for example, Ettingshausen's notation for a Stirling number such as ${n+m \brace n} = {-n \brack -n-m}$ was

$$\overset{-n}{F}_m$$

(see [6, § 151]).

Incidentally, these works of Kramp and Ettingshausen proved to be important in the history of
mathematical notations. Kramp's book introduced the notation n! for factorials [29, pages V and
219], and Ettingshausen's book introduced the notation $\binom{n}{k}$ for binomial coefficients [6, page 30].
Ettingshausen wrote his book shortly after Fourier [8] had invented $\Sigma$-notation for sums; Ettingshausen
tried a German variation, writing $\overset{k}{\underset{a,b}{\mathfrak{S}}}$ for what has evolved into $\sum_{k=a}^{b}$. He also wrote
$(a,r)^n$ for Kramp's $a^{n|r}$; thus, for example, Ettingshausen [6, § 153 and § 156] gave the equations

$$(a,d)^n = \overset{w}{\mathfrak{S}}\, \overset{n}{F}_w\, a^{n-w} d^w \quad\text{and}\quad a^n = \overset{r}{\mathfrak{S}}\, (-1)^r\, \overset{-n+r}{F}_r\, (a,d)^{n-r} d^r$$

as equivalents of Kramp's (2.21) and Stirling's (2.10). He presented Kramp's (2.22) in the form

$$v\, \overset{n}{F}_v = \overset{w}{\underset{0,v-1}{\mathfrak{S}}} \binom{n-w}{v+1-w} \overset{n}{F}_w,$$

and remarked [6, § 154] that this holds for both negative and positive n. Ettingshausen had
related the F coefficients to sums of products of combinations with and without repetition; thus he
implicitly confirmed (2.9).

The first person to attach Stirling's name to the numbers we now call Stirling numbers was 
Niels Nielsen in 1904 [40]; he said that this new nomenclature had been suggested to him by T. N. 
Thiele. (The numbers may have been studied before Stirling's time; for example, I once found the 
values of ${n \brack k}$ for $1 \le n \le 7$ in some unpublished manuscripts of Thomas Harriot, dating from about
1600, in the British Museum [26, page 241]. But Stirling almost surely deserves the credit for being
first to deduce nontrivial facts about ${n \brack k}$ and ${n \brace k}$.)

Nielsen wrote $C_n^k$ for ${n \brack n-k}$, which he called a "Stirling number of rank n"; and he wrote $\mathfrak{C}_n^k$ for
${n+k-1 \brace n-1}$, which he called a "Stirling number of rank $-n$." (He should really have defined its rank
to be $1-n$.) In equation (41) of his paper, Nielsen obtained a rigorous proof of the duality law (2.4);
but he had to state it in a peculiar way, because he had defined $C_n^k$ and $\mathfrak{C}_n^k$ only for nonnegative n
and k. Thus, he could not write $C_n^k = \mathfrak{C}_{1-n}^k$; he had to say instead that $f_k(n) = g_k(1-n)$, where
$f_k(n)$ and $g_k(n)$ were the polynomials defined by $C_n^k$ and $\mathfrak{C}_n^k$. Tweedie [54] expressed (2.4) with
similar circumlocutions.

When Jordan took up Stirling numbers [22], he wrote $S_n^k$ for $(-1)^{n-k}{n \brack k}$ and $\mathfrak{S}_n^k$ for ${n \brace k}$. He
does not seem to have known the duality law (2.4), probably because he had learned about Stirling 
numbers from Nielsen's book [41], which omitted some of the details in Nielsen's paper [40]. And 
as far as I know, the duality law largely disappeared from mathematicians' collective consciousness 
during most of the twentieth century; it seems to have been mentioned explicitly only in a few 
scattered places: (1) Hansraj Gupta, "working in a small township away from what was then the 
only University in the Panjab" [18, page 5], rediscovered Stirling numbers and Stirling duality by 
himself, in the early 1930s. This became part of his Ph.D. dissertation [17], and he included it in 
a book on number theory prepared many years later [18, Chapter 5]. (2) H. W. Gould [12] was 
probably the first twentieth-century mathematician to observe that we can use the polynomials 
${n \brack n-k}$ and ${n \brace n-k}$ to extend the domain of Stirling numbers to negative values of n. Gould's way of
writing (2.4) was $S_1(-n-1,k) = S_2(n,k)$; and shortly thereafter [13], he mentioned the equivalent
formula

$$S_{-n}^{-k} = (-1)^{n-k}\, \mathfrak{S}_k^n,$$

in Jordan's notation. (3) R. V. Parker [42], like Gupta, displayed both of Stirling's triangles 
in tandem, presenting them in a single table as Logan later did. (4) In 1976, Ira Gessel and 
Richard Stanley investigated some of the deeper structure underlying the Stirling polynomials 
$f_k(n) = {n+k \brace n}$ and $g_k(n) = {n \brack n-k}$. They noted in particular [11, equation (3)] that $f_k(-n) = g_k(n)$.
This fact is equivalent to the duality law (2.4). 
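The relation f_k(-n) = g_k(n) can be checked numerically without presupposing duality. The sketch below is only an illustration, and it relies on one assumption: that both Stirling polynomials have degree at most 2k, so that each is determined by 2k+1 of its ordinary (nonnegative) values. The helper names `interpolate`, `f`, and `g` are introduced here for the purpose.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def subset(n, k):
    """Stirling subset number {n brace k} for integers n, k >= 0."""
    if n == 0 or k == 0:
        return int(n == k)
    return k * subset(n - 1, k) + subset(n - 1, k - 1)

@lru_cache(maxsize=None)
def cycle(n, k):
    """Stirling cycle number [n brack k] for integers n, k >= 0."""
    if n == 0 or k == 0:
        return int(n == k)
    return (n - 1) * cycle(n - 1, k) + cycle(n - 1, k - 1)

def interpolate(points, x):
    """Value at x of the unique polynomial through the (xi, yi) pairs
    (Lagrange form, exact rational arithmetic)."""
    x = Fraction(x)
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def f(k, x):
    """f_k(x) = {x+k brace x}, fitted through x = 0, ..., 2k."""
    return interpolate([(n, subset(n + k, n)) for n in range(2 * k + 1)], x)

def g(k, x):
    """g_k(x) = [x brack x-k], fitted through x = k, ..., 3k."""
    return interpolate([(n, cycle(n, n - k)) for n in range(k, 3 * k + 1)], x)

for k in range(5):
    for n in range(-6, 7):
        assert f(k, -n) == g(k, n)

print(f(2, -5), g(2, 5))   # 35 35
```

Only values with nonnegative indices enter the fit; the values at negative arguments come from the polynomials themselves.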

Stanley had discovered a beautiful theorem in his Ph.D. thesis a few years earlier [50, Proposition
13.2(i)], now called the reciprocity theorem for order polynomials: If P is any finite partially
ordered set, let $\Omega(P,n)$ be the number of order-preserving mappings from P into the totally ordered
set {1,2,...,n}; and let $\bar\Omega(P,n)$ be the number of such mappings that are strictly order-preserving.
Thus, if $x \prec y$ in P, the mappings f enumerated by $\Omega(P,n)$ must satisfy $f(x) \le f(y)$, and the mappings
g enumerated by $\bar\Omega(P,n)$ must satisfy $g(x) < g(y)$. Stanley's theorem states that, in general,
we have $\Omega(P,-n) = (-1)^p\, \bar\Omega(P,n)$, where p is the number of elements of P. For example, if P consists of
p isolated points with no order constraints whatever, we have $\Omega(P,n) = \bar\Omega(P,n) = n^p$. And if the
points of P are themselves totally ordered, then $\Omega(P,n)$ is the number of combinations $\binom{n+p-1}{p}$
of n things p at a time with repetitions permitted, and $\bar\Omega(P,n)$ is $\binom{n}{p}$, the combinations without
repetition. In both cases we have $\Omega(P,-n) = (-1)^p\, \bar\Omega(P,n)$.

I showed Stanley the first draft of this note and asked him whether the Stirling duality law 
(2.4) could be derived as a special case of his general reciprocity law. Sure enough, he replied that 
Gessel had noticed a simple way to do exactly that, shortly after the paper [11] was written. Let 
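$P_k$ be the partial order on 2k points typified by

[Figure: diagram of the partial order $P_4$. In the sums below its 2k points are named $x_1,\ldots,x_k,y_1,\ldots,y_k$,
with $x_1 \prec \cdots \prec x_k$ and $y_j \prec x_j$ for each j.]

then

$$\begin{aligned}
\Omega(P_k,n) &= \sum_{1\le x_1,\ldots,x_k,y_1,\ldots,y_k\le n} [x_1\le\cdots\le x_k]\,[x_1\ge y_1]\ldots[x_k\ge y_k] \\
&= \sum_{1\le x_1,\ldots,x_k\le n} [x_1\le\cdots\le x_k]\, x_1\ldots x_k,
\end{aligned}$$

and

$$\begin{aligned}
\bar\Omega(P_k,n) &= \sum_{1\le x_1,\ldots,x_k,y_1,\ldots,y_k\le n} [x_1<\cdots<x_k]\,[x_1>y_1]\ldots[x_k>y_k] \\
&= \sum_{2\le x_1,\ldots,x_k\le n} [x_1<\cdots<x_k]\,(x_1-1)\ldots(x_k-1) \\
&= \sum_{1\le x_1,\ldots,x_k\le n-1} [x_1<\cdots<x_k]\, x_1\ldots x_k.
\end{aligned}$$

Thus the sums are respectively $T_k(n)$ and $C_k(n-1)$; by (2.6) we have $\Omega(P_k,n) = {n+k \brace n}$ and
$\bar\Omega(P_k,n) = {n \brack n-k}$; hence (2.4) is indeed an instance of Stanley's theorem.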
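Gessel's observation can be confirmed by brute force for small cases. The sketch below assumes, as the bracketed sums indicate, that P_k consists of a chain x_1, ..., x_k with one extra point y_j below each x_j; `count_maps` is a hypothetical helper that simply enumerates all maps into {1, ..., n}.

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def subset(n, k):
    """Stirling subset number {n brace k} for integers n, k >= 0."""
    if n == 0 or k == 0:
        return int(n == k)
    return k * subset(n - 1, k) + subset(n - 1, k - 1)

@lru_cache(maxsize=None)
def cycle(n, k):
    """Stirling cycle number [n brack k] for integers n, k >= 0."""
    if n == 0 or k == 0:
        return int(n == k)
    return (n - 1) * cycle(n - 1, k) + cycle(n - 1, k - 1)

def count_maps(k, n, strict):
    """Number of (strictly) order-preserving maps from P_k into {1..n},
    found by checking every assignment of values to x_1..x_k, y_1..y_k."""
    below = (lambda a, b: a < b) if strict else (lambda a, b: a <= b)
    count = 0
    for values in product(range(1, n + 1), repeat=2 * k):
        xs, ys = values[:k], values[k:]
        chain_ok = all(below(xs[i], xs[i + 1]) for i in range(k - 1))
        pairs_ok = all(below(ys[i], xs[i]) for i in range(k))
        count += chain_ok and pairs_ok
    return count

for k in (1, 2, 3):
    for n in range(k, 5):
        assert count_maps(k, n, strict=False) == subset(n + k, n)
        assert count_maps(k, n, strict=True) == cycle(n, n - k)

print(count_maps(2, 3, strict=False), count_maps(2, 3, strict=True))   # 25 2
```

The enumeration takes n^(2k) steps, so it is useful only as a sanity check on tiny cases.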

Now we are ready to discuss the second reason why I became convinced that ${n \brack k}$ is the right
symbolism for these coefficients after I had translated Logan's memo [34] into that notation: We
know that ${n \brack n-k}$ is a polynomial in n, when k is an integer; hence, as Kramp knew, we can sensibly
define the quantity ${\alpha \brack \alpha-k}$ for arbitrary complex $\alpha$ and integer k, using that same polynomial.
Then (and here comes the punch line) Logan noticed that the fundamental equations (2.17) and
(2.18) generalize to asymptotic formulas, valid for arbitrary exponents $\alpha$: If $z \to \infty$ and if m is any
nonnegative integer, we have

$$z^{\overline{\alpha}} = \sum_{k=0}^{m} {\alpha \brack \alpha-k}\, z^{\alpha-k} + O(z^{\alpha-m-1}); \qquad (2.25)$$
$$z^{\underline{\alpha}} = \sum_{k=0}^{m} {\alpha \brack \alpha-k}\, (-1)^k z^{\alpha-k} + O(z^{\alpha-m-1}). \qquad (2.26)$$

(See [15, exercise 9.44]; equation (2.25) is a correct way to formulate Kramp's divergent series (2.23).
These equations are special cases of a still more general result proved by Tricomi and Erdélyi [53, 9].)
The easily remembered expansions in (2.25) and (2.26) were quite a revelation to me. I had often
spent time laboriously calculating approximations to ratios such as $z^{\overline{1/2}} = \Gamma(z+1/2)/\Gamma(z)$, the
hard way: I took logarithms, then used Stirling's approximation, and then took exponentials. But
equations (2.25) and (2.26) produce the answer directly.
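As an illustration of (2.25), the sketch below approximates Γ(z+1/2)/Γ(z) with the first few terms of the expansion. It is a sketch under stated assumptions: the coefficients for α = 1/2 are obtained by interpolating the polynomial that agrees with the ordinary cycle numbers at 2k+1 integer points (assuming degree at most 2k), and the helper names `cycle_poly` and `rising_approx` are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import exp, lgamma

@lru_cache(maxsize=None)
def cycle(n, k):
    """Stirling cycle number [n brack k] for integers n, k >= 0."""
    if n == 0 or k == 0:
        return int(n == k)
    return (n - 1) * cycle(n - 1, k) + cycle(n - 1, k - 1)

def cycle_poly(alpha, k):
    """[alpha brack alpha-k]: Lagrange-interpolate the polynomial
    n -> [n brack n-k] through n = k, ..., 3k and evaluate it at alpha."""
    alpha = Fraction(alpha)
    nodes = range(k, 3 * k + 1)
    total = Fraction(0)
    for i in nodes:
        term = Fraction(cycle(i, i - k))
        for j in nodes:
            if j != i:
                term *= (alpha - j) / (i - j)
        total += term
    return total

def rising_approx(z, alpha, m):
    """The sum in (2.25) for k = 0..m, without the O-term."""
    return sum(float(cycle_poly(alpha, k)) * z ** (float(alpha) - k)
               for k in range(m + 1))

alpha = Fraction(1, 2)
print([cycle_poly(alpha, k) for k in range(4)])
# [Fraction(1, 1), Fraction(-1, 8), Fraction(1, 128), Fraction(5, 1024)]

z = 50.0
exact = exp(lgamma(z + 0.5) - lgamma(z))   # the "hard way", via logarithms
for m in range(4):
    print(m, abs(rising_approx(z, alpha, m) - exact))
```

Each additional term reduces the printed error at z = 50, as the O-term in (2.25) suggests it should for large z.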

Moreover Stirling's original identity (2.10) can be generalized in a similar way: If $\alpha$ is any
complex number, we have

$$z^{\alpha} = \sum_k {\alpha \brace \alpha-k}\, z^{\underline{\alpha-k}}, \qquad \mathrm{Re}(z) > 0. \qquad (2.27)$$



When I wrote the first draft of this note, I knew only that the series (2.27) was convergent, and that
it was asymptotically correct as $z \to \infty$; so I conjectured that equality might hold. Soon afterward,
B. F. Logan found the following proof (although he naturally stated it in his own notation): Suppose
first that $\mathrm{Re}(\alpha) < 1$. Then we have the well known identity

$$z^{\alpha-1} = \frac{1}{\Gamma(1-\alpha)} \int_0^\infty e^{-zt}\, t^{-\alpha}\, dt, \qquad \mathrm{Re}(z) > 0, \qquad (2.28)$$

and we can substitute $e^{-t} = 1-u$ to get

$$z^{\alpha-1} = \frac{1}{\Gamma(1-\alpha)} \int_0^1 (1-u)^{z-1}\, u^{-\alpha} \left(\frac{1}{u}\ln\frac{1}{1-u}\right)^{-\alpha} du.$$

Now it turns out that the powers of $\frac{1}{u}\ln\frac{1}{1-u}$ generate the Stirling numbers ${\alpha \brace \alpha-k} = {k-\alpha \brack -\alpha}$, in the
sense that

$$\left(\frac{1}{u}\ln\frac{1}{1-u}\right)^{-\alpha} = \sum_k {\alpha \brace \alpha-k} \frac{u^k}{(k-\alpha)\ldots(1-\alpha)}, \qquad (2.29)$$

a series that converges for $|u| < 1$ (see [15, equations (6.45), (6.53), (7.50)]). Therefore

$$\begin{aligned}
z^{\alpha} &= \sum_k {\alpha \brace \alpha-k} \frac{z}{\Gamma(k+1-\alpha)} \int_0^1 (1-u)^{z-1}\, u^{k-\alpha}\, du \\
&= \sum_k {\alpha \brace \alpha-k} \frac{\Gamma(z+1)}{\Gamma(z+1+k-\alpha)} = \sum_k {\alpha \brace \alpha-k} \frac{z!}{(z+k-\alpha)!},
\end{aligned}$$

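and (2.27) is verified when $\mathrm{Re}(\alpha) < 1$. To complete the proof, we need only show that (2.27) holds
for $\alpha + 1$ if it holds for $\alpha$; but this is easy, because

$$\begin{aligned}
z^{\alpha+1} &= \sum_k {\alpha \brace \alpha-k}\, z \cdot z^{\underline{\alpha-k}} \\
&= \sum_k {\alpha \brace \alpha-k}\, z^{\underline{\alpha+1-k}} + \sum_k {\alpha \brace \alpha+1-k}\, (\alpha+1-k)\, z^{\underline{\alpha+1-k}} \\
&= \sum_k {\alpha+1 \brace \alpha+1-k}\, z^{\underline{\alpha+1-k}}
\end{aligned}$$

by the basic recurrence equation (2.3).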

Notice that in all of the general identities (2.25)-(2.27), as in the original formulas (2.10), 
(2.17), and (2.18) that inspired them, the lower index within the braces or brackets is the same 
as the exponent of z. This makes the relations easy to remember, by analogy with the binomial 
theorem 

$$(1+z)^{\alpha} = \sum_k \binom{\alpha}{k} z^k, \qquad \text{when } |z| < 1. \qquad (2.30)$$

Some readers will have been thinking, "This all looks fairly plausible, but unfortunately Knuth 
is overlooking a key point that ruins the whole proposal: We can't use the notation ${n \brack k}$ for Stirling
numbers, because it has already been used for more than a century as the standard notation for
Gauss's generalized binomial coefficients." 

Well, there is a down side to every good idea, but this objection is not really severe. For 
one thing, the standard notation for Gaussian binomial coefficients involves a hidden parameter q, 
and it's not unusual for modern researchers to make transformations that change q. Therefore 
Gauss's notation is incomplete, and Andrews (for example) has used the notation ${n \brack k}_2$ for the
Gaussian coefficient with $q^2$ as the hidden parameter [2, page 49]. Such examples suggest that it is
appropriate to denote Gaussian binomials as $\binom{n}{k}_q$, especially since they reduce to ordinary binomials
when q = 1. This notation also generalizes nicely to such things as Fibonomial coefficients $\binom{n}{k}_{\mathcal{F}}$;
see [27]. We can then reserve the notation ${n \brack k}_q$ for a q-generalization of ${n \brack k}$. (The reverse strategy
was unfortunately adopted in [14].) Secondly, I do not believe that any existing mathematical
works, including books like [2] which use Gaussian coefficients extensively, would become seriously
cluttered if the Gaussian ${n \brack k}$ were changed everywhere to $\binom{n}{k}_q$. Even so, such changes are not
necessary; there is obviously no harm in beginning a mathematical paper or a book chapter or an
entire book with a statement to the effect that "${n \brack k}$ will denote a Gaussian binomial coefficient
with parameter q in what follows." All notation can be redefined for special purposes. Therefore
Stirling number enthusiasts are not encroaching on Gaussian territory when they write ${n \brack k}$, if they
also mumble something about Stirling in order to set the context. 
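The point about the hidden parameter can be made concrete. The sketch below computes Gaussian binomial coefficients with q passed explicitly, using the standard q-analogue of Pascal's rule (an assumption of this example, not a formula taken from the paper); the function name `gaussian` is illustrative.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def gaussian(n, k, q):
    """Gaussian binomial coefficient for integers 0 <= k <= n (else 0),
    with the parameter q passed explicitly rather than hidden."""
    if k < 0 or k > n:
        return 0
    if k == 0 or k == n:
        return 1
    # q-analogue of Pascal's rule.
    return gaussian(n - 1, k - 1, q) + q ** k * gaussian(n - 1, k, q)

print(gaussian(4, 2, 1), comb(4, 2))   # 6 6
print(gaussian(4, 2, 2))               # 35
print(gaussian(4, 2, 2 ** 2))          # 357
```

With q = 1 the values are ordinary binomial coefficients, and replacing q by q^2 gives a different number, which is why a notation that records the parameter is convenient.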

One further point is worth noting in conclusion: As soon as the notations ${n \brack k}$ and/or ${n \brace k}$ are
adopted, there will no longer be a need to speak about Stirling numbers "of the first and second
kind," except as a concession to history. Nielsen wrote a superb book [41], but he did the world a
disservice by originating the Erster Art and Zweiter Art terminology, because that terminology has
no mnemonic value and is historically inaccurate. Stirling introduced the numbers ${n \brace k}$ first and
brought in ${n \brack k}$ second. Indeed, practical applications have always tended to involve the numbers
${n \brace k}$ much more often than their ${n \brack k}$ counterparts. It seems far better to speak of ${n \brace k}$ as a Stirling
subset number, and to call ${n \brack k}$ a Stirling cycle number. Then the names are tied to intuitive,
student-friendly concepts, not to arbitrary and offputting concepts of the kth kind.

Acknowledgments. I am extremely grateful for comments received from John Ewing, Philippe 
Flajolet, Adriano Garsia, B. F. Logan, Andrew Odlyzko, Richard Stanley, and H. S. Wilf, without 
which these notes would have been substantially poorer.